Goto

Collaborating Authors

 engineering application


Evaluating Hydro-Science and Engineering Knowledge of Large Language Models

Hu, Shiruo, Shan, Wenbo, Li, Yingjia, Wan, Zhiqi, Yu, Xinpeng, Qi, Yunjia, Xia, Haotian, Xiao, Yang, Liu, Dingxiao, Wang, Jiaru, Gong, Chenxu, Zhang, Ruixi, Wu, Shuyue, Cui, Shibo, Lai, Chee Hui, Luo, Wei, He, Yubin, Xu, Bin, Zhao, Jianshi

arXiv.org Artificial Intelligence

Hydro-Science and Engineering (Hydro-SE) is a critical and irreplaceable domain that secures human water supply, generates clean hydropower energy, and mitigates flood and drought disasters. Featuring multiple engineering objectives, Hydro-SE is an inherently interdisciplinary domain that integrates scientific knowledge with engineering expertise. This integration necessitates extensive expert collaboration in decision-making, which poses difficulties for intelligence. With the rapid advancement of large language models (LLMs), their potential application in the Hydro-SE domain is being increasingly explored. However, the knowledge and application abilities of LLMs in Hydro-SE have not been sufficiently evaluated. To address this issue, we propose the Hydro-SE LLM evaluation benchmark (Hydro-SE Bench), which contains 4,000 multiple-choice questions. Hydro-SE Bench covers nine subfields and enables evaluation of LLMs in aspects of basic conceptual knowledge, engineering application ability, and reasoning and calculation ability. The evaluation results on Hydro-SE Bench show that the accuracy values vary among 0.74 to 0.80 for commercial LLMs, and among 0.41 to 0.68 for small-parameter LLMs. While LLMs perform well in subfields closely related to natural and physical sciences, they struggle with domain-specific knowledge such as industry standards and hydraulic structures. Model scaling mainly improves reasoning and calculation abilities, but there is still great potential for LLMs to better handle problems in practical engineering application. This study highlights the strengths and weaknesses of LLMs for Hydro-SE tasks, providing model developers with clear training targets and Hydro-SE researchers with practical guidance for applying LLMs.


Curriculum-Based Iterative Self-Play for Scalable Multi-Drone Racing

Akgün, Onur

arXiv.org Artificial Intelligence

The coordination of multiple autonomous agents in high-speed, competitive environments represents a significant engineering challenge. This paper presents CRUISE (Curriculum-Based Iterative Self-Play for Scalable Multi-Drone Racing), a reinforcement learning framework designed to solve this challenge in the demanding domain of multi-drone racing. CRUISE overcomes key scalability limitations by synergistically combining a progressive difficulty curriculum with an efficient self-play mechanism to foster robust competitive behaviors. Validated in high-fidelity simulation with realistic quadrotor dynamics, the resulting policies significantly outperform both a standard reinforcement learning baseline and a state-of-the-art game-theoretic planner. CRUISE achieves nearly double the planner's mean racing speed, maintains high success rates, and demonstrates robust scalability as agent density increases. Ablation studies confirm that the curriculum structure is the critical component for this performance leap. By providing a scalable and effective training methodology, CRUISE advances the development of autonomous systems for dynamic, competitive tasks and serves as a blueprint for future real-world deployment.


A Large Language Model-Empowered Agent for Reliable and Robust Structural Analysis

Liu, Jiachen, Geng, Ziheng, Cao, Ran, Cheng, Lu, Bocchini, Paolo, Cheng, Minghui

arXiv.org Artificial Intelligence

Large language models (LLMs) have exhibited remarkable capabilities across diverse open-domain tasks, yet their application in specialized domains such as civil engineering remains largely unexplored. This paper starts bridging this gap by evaluating and enhancing the reliability and robustness of LLMs in structural analysis of beams. Reliability is assessed through the accuracy of correct outputs under repetitive runs of the same problems, whereas robustness is evaluated via the performance across varying load and boundary conditions. A benchmark dataset, comprising eight beam analysis problems, is created to test the Llama-3.3 70B Instruct model. Results show that, despite a qualitative understanding of structural mechanics, the LLM lacks the quantitative reliability and robustness for engineering applications. To address these limitations, a shift is proposed that reframes the structural analysis as code generation tasks. Accordingly, an LLM-empowered agent is developed that (a) integrates chain-of-thought and few-shot prompting to generate accurate OpeeSeesPy code, and (b) automatically executes the code to produce structural analysis results. Experimental results demonstrate that the agent achieves accuracy exceeding 99.0% on the benchmark dataset, exhibiting reliable and robust performance across diverse conditions. Ablation studies highlight the complete example and function usage examples as the primary contributors to the agent's enhanced performance.


Investigating the Potential of Large Language Model-Based Router Multi-Agent Architectures for Foundation Design Automation: A Task Classification and Expert Selection Study

Youwai, Sompote, Phim, David, Murcia, Vianne Gayl, Onas, Rianne Clair

arXiv.org Artificial Intelligence

This study investigates router-based multi-agent systems for automating foundation design calculations through intelligent task classification and expert selection. Three approaches were evaluated: single-agent processing, multi-agent designer-checker architecture, and router-based expert selection. Performance assessment utilized baseline models including DeepSeek R1, ChatGPT 4 Turbo, Grok 3, and Gemini 2.5 Pro across shallow foundation and pile design scenarios. The router-based configuration achieved performance scores of 95.00% for shallow foundations and 90.63% for pile design, representing improvements of 8.75 and 3.13 percentage points over standalone Grok 3 performance respectively. The system outperformed conventional agentic workflows by 10.0 to 43.75 percentage points. Grok 3 demonstrated superior standalone performance without external computational tools, indicating advances in direct LLM mathematical reasoning for engineering applications. The dual-tier classification framework successfully distinguished foundation types, enabling appropriate analytical approaches. Results establish router-based multi-agent systems as optimal for foundation design automation while maintaining professional documentation standards. Given safety-critical requirements in civil engineering, continued human oversight remains essential, positioning these systems as advanced computational assistance tools rather than autonomous design replacements in professional practice.


Enhancing people localisation in drone imagery for better crowd management by utilising every pixel in high-resolution images

Ptak, Bartosz, Kraft, Marek

arXiv.org Artificial Intelligence

Accurate people localisation using drones is crucial for effective crowd management, not only during massive events and public gatherings but also for monitoring daily urban crowd flow. Traditional methods for tiny object localisation using high-resolution drone imagery often face limitations in precision and efficiency, primarily due to constraints in image scaling and sliding window techniques. To address these challenges, a novel approach dedicated to point-oriented object localisation is proposed. Along with this approach, the Pixel Distill module is introduced to enhance the processing of high-definition images by extracting spatial information from individual pixels at once. Additionally, a new dataset named UP-COUNT, tailored to contemporary drone applications, is shared. It addresses a wide range of challenges in drone imagery, such as simultaneous camera and object movement during the image acquisition process, pushing forward the capabilities of crowd management applications. A comprehensive evaluation of the proposed method on the proposed dataset and the commonly used DroneCrowd dataset demonstrates the superiority of our approach over existing methods and highlights its efficacy in drone-based crowd object localisation tasks. These improvements markedly increase the algorithm's applicability to operate in real-world scenarios, enabling more reliable localisation and counting of individuals in dynamic environments.


SalNAS: Efficient Saliency-prediction Neural Architecture Search with self-knowledge distillation

Termritthikun, Chakkrit, Umer, Ayaz, Suwanwimolkul, Suwichaya, Xia, Feng, Lee, Ivan

arXiv.org Artificial Intelligence

Recent advancements in deep convolutional neural networks have significantly improved the performance of saliency prediction. However, the manual configuration of the neural network architectures requires domain knowledge expertise and can still be time-consuming and error-prone. To solve this, we propose a new Neural Architecture Search (NAS) framework for saliency prediction with two contributions. Firstly, a supernet for saliency prediction is built with a weight-sharing network containing all candidate architectures, by integrating a dynamic convolution into the encoder-decoder in the supernet, termed SalNAS. Secondly, despite the fact that SalNAS is highly efficient (20.98 million parameters), it can suffer from the lack of generalization. To solve this, we propose a self-knowledge distillation approach, termed Self-KD, that trains the student SalNAS with the weighted average information between the ground truth and the prediction from the teacher model. The teacher model, while sharing the same architecture, contains the best-performing weights chosen by cross-validation. Self-KD can generalize well without the need to compute the gradient in the teacher model, enabling an efficient training system. By utilizing Self-KD, SalNAS outperforms other state-of-the-art saliency prediction models in most evaluation rubrics across seven benchmark datasets while being a lightweight model. The code will be available at https://github.com/chakkritte/SalNAS


Fair Federated Data Clustering through Personalization: Bridging the Gap between Diverse Data Distributions

Gupta, Shivam, Tarushi, null, Wangzes, Tsering, Jain, Shweta

arXiv.org Artificial Intelligence

The rapid growth of data from edge devices has catalyzed the performance of machine learning algorithms. However, the data generated resides at client devices thus there are majorly two challenge faced by traditional machine learning paradigms - centralization of data for training and secondly for most the generated data the class labels are missing and there is very poor incentives to clients to manually label their data owing to high cost and lack of expertise. To overcome these issues, there have been initial attempts to handle unlabelled data in a privacy preserving distributed manner using unsupervised federated data clustering. The goal is partition the data available on clients into $k$ partitions (called clusters) without actual exchange of data. Most of the existing algorithms are highly dependent on data distribution patterns across clients or are computationally expensive. Furthermore, due to presence of skewed nature of data across clients in most of practical scenarios existing models might result in clients suffering high clustering cost making them reluctant to participate in federated process. To this, we are first to introduce the idea of personalization in federated clustering. The goal is achieve balance between achieving lower clustering cost and at same time achieving uniform cost across clients. We propose p-FClus that addresses these goal in a single round of communication between server and clients. We validate the efficacy of p-FClus against variety of federated datasets showcasing it's data independence nature, applicability to any finite $\ell$-norm, while simultaneously achieving lower cost and variance.


Physics-Enhanced Machine Learning: a position paper for dynamical systems investigations

Cicirello, Alice

arXiv.org Artificial Intelligence

This position paper takes a broad look at Physics-Enhanced Machine Learning (PEML) -- also known as Scientific Machine Learning -- with particular focus to those PEML strategies developed to tackle dynamical systems' challenges. The need to go beyond Machine Learning (ML) strategies is driven by: (i) limited volume of informative data, (ii) avoiding accurate-but-wrong predictions; (iii) dealing with uncertainties; (iv) providing Explainable and Interpretable inferences. A general definition of PEML is provided by considering four physics and domain knowledge biases, and three broad groups of PEML approaches are discussed: physics-guided, physics-encoded and physics-informed. The advantages and challenges in developing PEML strategies for guiding high-consequence decision making in engineering applications involving complex dynamical systems, are presented.


Comparison of edge computing methods in Internet of Things architectures for efficient estimation of indoor environmental parameters with Machine Learning

Gamazo-Real, Jose-Carlos, Fernandez, Raul Torres, Armas, Adrian Murillo

arXiv.org Artificial Intelligence

The large increase in the number of Internet of Things (IoT) devices have revolutionised the way data is processed, which added to the current trend from cloud to edge computing has resulted in the need for efficient and reliable data processing near the data sources using energy-efficient devices. Two methods based on low-cost edge-IoT architectures are proposed to implement lightweight Machine Learning (ML) models that estimate indoor environmental quality (IEQ) parameters, such as Artificial Neural Networks of Multilayer Perceptron type. Their implementation is based on centralised and distributed parallel IoT architectures, connected via wireless, which share commercial off-the-self modules for data acquisition and sensing, such as sensors for temperature, humidity, illuminance, CO2, and other gases. The centralised method uses a Graphics Processing Unit and the Message Queuing Telemetry Transport protocol, but the distributed method utilises low performance ARM-based devices and the Message Passing Interface protocol. Although multiple IEQ parameters are measured, the training and testing of ML models is accomplished with experiments focused on small temperature and illuminance datasets to reduce data processing load, obtained from sudden spikes, square profiles and sawteeth test cases. The results show a high estimation performance with F-score and Accuracy values close to 0.95, and an almost theorical Speedup with a reduction in power consumption close to 37% in the distributed parallel approach. In addition, similar or slightly better performance is achieved compared to equivalent IoT architectures from related research, but error reduction of 35 to 76% is accomplished with an adequate balance between performance and energy efficiency.


Efficient Propagation of Uncertainty via Reordering Monte Carlo Samples

Khatamsaz, Danial, Attari, Vahid, Arroyave, Raymundo, Allaire, Douglas L.

arXiv.org Artificial Intelligence

Uncertainty analysis in the outcomes of model predictions is a key element in decision-based material design to establish confidence in the models and evaluate the fidelity of models. Uncertainty Propagation (UP) is a technique to determine model output uncertainties based on the uncertainty in its input variables. The most common and simplest approach to propagate the uncertainty from a model inputs to its outputs is by feeding a large number of samples to the model, known as Monte Carlo (MC) simulation which requires exhaustive sampling from the input variable distributions. However, MC simulations are impractical when models are computationally expensive. In this work, we investigate the hypothesis that while all samples are useful on average, some samples must be more useful than others. Thus, reordering MC samples and propagating more useful samples can lead to enhanced convergence in statistics of interest earlier and thus, reducing the computational burden of UP process. Here, we introduce a methodology to adaptively reorder MC samples and show how it results in reduction of computational expense of UP processes.